---
title: "Reference: voice.listen() | Voice Providers | Kastrax Docs"
description: "Documentation for the listen() method available in all Kastrax voice providers, which converts speech to text."
---

# voice.listen() ✅

The `listen()` method is a core function available in all Kastrax voice providers that converts speech to text. It takes an audio stream as input and returns the transcribed text.

## Usage Example ✅

```typescript
import { OpenAIVoice } from "@kastrax/voice-openai";
import { getMicrophoneStream } from "@kastrax/node-audio";
import { createReadStream } from "fs";
import path from "path";

// Initialize a voice provider
const voice = new OpenAIVoice({
  listeningModel: {
    name: "whisper-1",
    apiKey: process.env.OPENAI_API_KEY,
  },
});

// Basic usage with a file stream
const audioFilePath = path.join(process.cwd(), "audio.mp3");
const audioStream = createReadStream(audioFilePath);
const transcript = await voice.listen(audioStream, {
  filetype: "mp3",
});
console.log("Transcribed text:", transcript);

// Using a microphone stream
const microphoneStream = getMicrophoneStream(); // Assume this function gets audio input
const transcription = await voice.listen(microphoneStream);

// With provider-specific options
const transcriptWithOptions = await voice.listen(audioStream, {
  language: "en",
  prompt: "This is a conversation about artificial intelligence.",
});
```

## Parameters ✅

<PropertiesTable
  content={[
    {
      name: "audioStream",
      type: "NodeJS.ReadableStream",
      description: "Audio stream to transcribe. This can be a file stream or a microphone stream.",
      isOptional: false,
    },
    {
      name: "options",
      type: "object",
      description: "Provider-specific options for speech recognition",
      isOptional: true,
    }
  ]}
/>

## Return Value ✅

Returns one of the following:
- `Promise<string>`: A promise that resolves to the transcribed text
- `Promise<NodeJS.ReadableStream>`: A promise that resolves to a stream of transcribed text (for streaming transcription)
- `Promise<void>`: For real-time providers that emit 'writing' events instead of returning text directly

## Provider-Specific Options ✅

Each voice provider may support additional options specific to their implementation. Here are some examples:

### OpenAI

<PropertiesTable
  content={[
    {
      name: "options.filetype",
      type: "string",
      description: "Audio file format (e.g., 'mp3', 'wav', 'm4a')",
      isOptional: true,
      defaultValue: "'mp3'",
    },
    {
      name: "options.prompt",
      type: "string",
      description: "Text to guide the model's transcription",
      isOptional: true,
    },
    {
      name: "options.language",
      type: "string",
      description: "Language code (e.g., 'en', 'fr', 'de')",
      isOptional: true,
    }
  ]}
/>

### Google

<PropertiesTable
  content={[
    {
      name: "options.stream",
      type: "boolean",
      description: "Whether to use streaming recognition",
      isOptional: true,
      defaultValue: "false",
    },
    {
      name: "options.config",
      type: "object",
      description: "Recognition configuration from Google Cloud Speech-to-Text API",
      isOptional: true,
      defaultValue: "{ encoding: 'LINEAR16', languageCode: 'en-US' }",
    }
  ]}
/>

### Deepgram

<PropertiesTable
  content={[
    {
      name: "options.model",
      type: "string",
      description: "Deepgram model to use for transcription",
      isOptional: true,
      defaultValue: "'nova-2'",
    },
    {
      name: "options.language",
      type: "string",
      description: "Language code for transcription",
      isOptional: true,
      defaultValue: "'en'",
    }
  ]}
/>

## Realtime Voice Providers ✅

When using realtime voice providers like `OpenAIRealtimeVoice`, the `listen()` method behaves differently:

- Instead of returning transcribed text, it emits 'writing' events with the transcribed text
- You need to register an event listener to receive the transcription

```typescript
import { OpenAIRealtimeVoice } from "@kastrax/voice-openai-realtime";
import { getMicrophoneStream } from "@kastrax/node-audio";

const voice = new OpenAIRealtimeVoice();
await voice.connect();

// Register event listener for transcription
voice.on("writing", ({ text, role }) => {
  console.log(`${role}: ${text}`);
});

// This will emit 'writing' events instead of returning text
const microphoneStream = getMicrophoneStream();
await voice.listen(microphoneStream);
```

## Using with CompositeVoice ✅

When using `CompositeVoice`, the `listen()` method delegates to the configured listening provider:

```typescript
import { CompositeVoice } from "@kastrax/core/voice";
import { OpenAIVoice } from "@kastrax/voice-openai";
import { PlayAIVoice } from "@kastrax/voice-playai";

const voice = new CompositeVoice({
  listenProvider: new OpenAIVoice(),
  speakProvider: new PlayAIVoice(),
});

// This will use the OpenAIVoice provider
const transcript = await voice.listen(audioStream);
```

## Notes ✅

- Not all voice providers support speech-to-text functionality (e.g., PlayAI, Speechify)
- The behavior of `listen()` may vary slightly between providers, but all implementations follow the same basic interface
- When using a realtime voice provider, the method might not return text directly but instead emit a 'writing' event
- The audio format supported depends on the provider. Common formats include MP3, WAV, and M4A
- Some providers support streaming transcription, where text is returned as it's transcribed
- For best performance, consider closing or ending the audio stream when you're done with it

## Related Methods ✅

- [voice.speak()](./voice.speak) - Converts text to speech
- [voice.send()](./voice.send) - Sends audio data to the voice provider in real-time
- [voice.on()](./voice.on) - Registers an event listener for voice events
